Improving Speaker Diarization

Authors

  • Claude Barras
  • Xuan Zhu
  • Sylvain Meignier
  • Jean-Luc Gauvain
Abstract

This paper describes the LIMSI speaker diarization system used in the RT-04F evaluation. The RT-04F system builds upon the LIMSI baseline data partitioner, which is used in the broadcast news transcription system. This partitioner provides high cluster purity but tends to split the data from a speaker into several clusters when there is a large quantity of data for that speaker. In the RT-03S evaluation, the baseline partitioner had a 24.5% diarization error rate. Several improvements to the baseline diarization system have been made. A standard Bayesian information criterion (BIC) agglomerative clustering has been integrated, replacing the iterative Gaussian mixture model (GMM) clustering; a local BIC criterion is used for comparing single Gaussians with full covariance matrices. A second clustering stage has been added, making use of a speaker identification method: maximum a posteriori adaptation of a reference GMM with 128 Gaussians. A final post-processing stage refines the segment boundaries using the output of the transcription system. Compared to the best configuration of the baseline system for this task, the improved system reduces the speaker error time by over 75% on the development data. On evaluation data, an 8.5% overall diarization error rate was obtained, a 60% reduction in error compared to the baseline.
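
The BIC clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common textbook form of the local delta-BIC between two clusters, each modeled by a single full-covariance Gaussian, with the penalty computed from the sizes of the two clusters being compared only. The function names `delta_bic` and `agglomerate`, the penalty weight `lam`, and the stopping rule are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def _logdet_cov(frames):
    # Log-determinant of the maximum-likelihood full covariance of the frames.
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, y, lam=1.0):
    """Local delta-BIC between clusters x and y (arrays of shape [frames, dims]).

    Negative values favor merging the two clusters into one Gaussian.
    `lam` is a hypothetical penalty weight, not a value from the paper.
    """
    n_x, d = x.shape
    n_y = y.shape[0]
    n = n_x + n_y
    # Likelihood gain from modeling the two clusters separately.
    gain = 0.5 * (n * _logdet_cov(np.vstack([x, y]))
                  - n_x * _logdet_cov(x)
                  - n_y * _logdet_cov(y))
    # Local penalty: depends only on the two clusters under comparison.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * lam * n_params * np.log(n)
    return gain - penalty


def agglomerate(clusters, lam=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative."""
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # no remaining pair is better modeled as a single cluster
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

In this sketch each cluster needs enough frames for its covariance estimate to be non-singular. A real system would start from the segments produced by an earlier segmentation step and would tune `lam` on development data.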


Similar resources

An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives

In this paper we present a novel scheme for improving speaker diarization by making use of repeating speakers across multiple recordings within a large corpus. We call this technique speaker re-diarization and demonstrate that it is possible to reuse the initial speaker-linked diarization outputs to boost diarization accuracy within individual recordings. We first propose and evaluate two novel...


Two's a crowd: improving speaker diarization by automatically identifying and excluding overlapped speech

We present an update to our initial work [1] on overlapped speech detection for improving speaker diarization. Specifically, we describe the addition of new features and feature warping techniques that improve segmenter and, consequently, diarization performance. We also demonstrate improved diarization performance by additionally using overlap segment information in a new diarization pre-proce...


Online two speaker diarization

Short conversations pose some challenges for online diarization due to data sparseness and unbalanced representation of the two speakers. This paper presents our recent advances in online diarization of two-wire telephone conversations, introducing several methods for improving processing efficiency and accuracy on short conversations. Our framework is based on the offline diarization of a conv...


Improving speaker diarization for CHIL lecture meetings

Speaker diarization is often performed before automatic speech recognition (ASR) to label speaker segments. In this paper we present two simple schemes to improve the speaker diarization performance. The first is to iteratively refine GMM speaker models by frame level re-labeling and smoothing of the decision likelihood. The second is to use word level alignment information from the ASR process...


Integration of TDOA features in information bottleneck framework for fast speaker diarization

In this paper we address the combination of multiple feature streams in a fast speaker diarization system for meeting recordings. Whenever Multiple Distant Microphones (MDM) are used, it is possible to estimate the Time Delay of Arrival (TDOA) for different channels. In [9], it is shown that TDOA can be used as additional features together with conventional spectral features for improving speak...


Improving speaker segmentation via speaker identification and text segmentation

Speaker segmentation is an essential part of a speaker diarization system. Common segmentation systems usually miss speaker change points when speakers switch fast. These errors seriously confuse the following speaker clustering step and result in high overall speaker diarization error rates. In this paper two methods are proposed to deal with this problem: The first approach uses speaker ident...




Publication date: 2004